Molecular clock fork phylogenies: closed form analytic maximum likelihood solutions.
Abstract
Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees, but finding the global optimum is a hard computational task. Because no general analytic solution is known, numeric techniques such as hill climbing or expectation maximization (EM) are used to find optimal parameters for a given tree. So far, analytic solutions have been derived only for the simplest model: three taxa, two-state characters, under a molecular clock. Quoting Ziheng Yang, who initiated the analytic approach, "this seems to be the simplest case, but has many of the conceptual and statistical complexities involved in phylogenetic estimation." In this work, we give general analytic solutions for a family of trees with four taxa, two-state characters, under a molecular clock. The change from three to four taxa incurs a major increase in the complexity of the underlying algebraic system, and requires novel techniques and approaches. We start by presenting the general maximum likelihood problem on phylogenetic trees as a constrained optimization problem, and the resulting system of polynomial equations. In full generality, it is infeasible to solve this system; therefore, specialized tools for the molecular clock case are developed. Four-taxa rooted trees have two topologies: the fork (two subtrees with two leaves each) and the comb (one subtree with three leaves, the other with a single leaf). We combine the ultrametric properties of molecular clock fork trees with the Hadamard conjugation to derive a number of topology-dependent identities. Employing these identities, we substantially simplify the system of polynomial equations for the fork. Finally, we employ symbolic algebra software to obtain closed-form analytic solutions (expressed parametrically in the input data). In general, four-taxa trees can have multiple ML points. In contrast, we can now prove that each fork topology has a unique (local and global) ML point.
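The abstract does not spell out the likelihood function, so the following is only a minimal sketch of how the Hadamard conjugation can produce site-pattern probabilities for a clock fork ((1,2),(3,4)) with two-state characters. It assumes the symmetric two-state model and the standard form s = H^{-1} exp(Hq); the bitmask encoding of splits, the parameter names `a`, `b`, `c`, and the function names are illustrative choices, not taken from the paper.

```python
import math

# Splits are encoded as bitmasks over taxa 1..3, with taxon 4 as the reference:
# bit 0 = taxon 1, bit 1 = taxon 2, bit 2 = taxon 3 (an assumed convention).
PENDANT_1, PENDANT_2, PENDANT_3, PENDANT_4 = 0b001, 0b010, 0b100, 0b111
INTERNAL_12_34 = 0b011


def hadamard_entry(i, j):
    """Entry of the 8x8 Sylvester Hadamard matrix: (-1)^popcount(i & j)."""
    return -1 if bin(i & j).count("1") % 2 else 1


def fork_pattern_probs(a, b, c):
    """Site-pattern probabilities for the clock fork ((1,2),(3,4)).

    a: length of each pendant edge in the cherry (1,2)
    b: length of each pendant edge in the cherry (3,4)
    c: path length between the two cherry parents, through the root
    Lengths are expected numbers of substitutions per site. The molecular
    clock ties the two pendant edges of each cherry together and requires
    c >= |a - b|; this function does not enforce that inequality.
    """
    q = [0.0] * 8
    q[PENDANT_1] = q[PENDANT_2] = a
    q[PENDANT_3] = q[PENDANT_4] = b
    q[INTERNAL_12_34] = c
    q[0] = -sum(q)  # the spectrum entries sum to zero
    # r = exp(H q), then s = H^{-1} r, using H^{-1} = H / 8.
    r = [math.exp(sum(hadamard_entry(i, j) * q[j] for j in range(8)))
         for i in range(8)]
    return [sum(hadamard_entry(i, j) * r[j] for j in range(8)) / 8.0
            for i in range(8)]


def log_likelihood(counts, a, b, c):
    """Log-likelihood of observed pattern counts (indexed by the same bitmasks)."""
    s = fork_pattern_probs(a, b, c)
    return sum(n * math.log(p) for n, p in zip(counts, s) if n > 0)
```

Under these assumptions, the clock symmetry shows up directly in the output: patterns that differ only by swapping the two leaves of a cherry receive equal probability, which is the kind of topology-dependent identity the abstract refers to. A numeric optimizer over `(a, b, c)` could then be compared against a closed-form solution.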
Similar resources
Analytic solutions of maximum likelihood on forks of four taxa.
This work deals with symbolic mathematical solutions to maximum likelihood on small phylogenetic trees. Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees, but finding the global optimum is a hard computational task. In this work, we give general analytic solutions for a family of trees with four taxa, two state characters, under a molecular...
Maximum Likelihood Molecular Clock Comb: Analytic Solutions
Maximum likelihood (ML) is increasingly used as an optimality criterion for selecting evolutionary trees, but finding the global optimum is a hard computational task. Because no general analytic solution is known, numeric techniques such as hill climbing or expectation maximization (EM) are used in order to find optimal parameters for a given tree. So far, analytic solutions were derived only ...
Analytic Solutions for Three-Taxon MLMC Trees with Variable Rates Across Sites
We consider the problem of finding the maximum likelihood rooted tree under a molecular clock (MLMC), with three species and 2-state characters under a symmetric model of substitution. For identically distributed rates per site this is probably the simplest phylogenetic estimation problem, and it is readily solved numerically. Analytic solutions, on the other hand, were obtained only recently (...
Evolution and host specificity in the ectomycorrhizal genus Leccinum
• Species of the ectomycorrhizal genus Leccinum are generally considered to be host specialists. We determined the phylogenetic relationships between species of Leccinum from Europe and North America based on second internal transcribed spacer (ITS2) and glyceraldehyde 3-phosphate dehydrogenase (Gapdh). • We plotted host associations onto the phylogenies using maximum likelihood and parsimony...
Estimating phylogenies under maximum likelihood: A very large-scale neighborhood approach
A basic problem in evolutionary genetics is the estimation of phylogenies among DNA or protein sequences. This problem is known to be NP-hard under several optimality criteria used for evaluating the quality of phylogenies. Consequently, one can reasonably search for optimal phylogenies only for datasets of small sizes such that the ever increasing number of molecular data accumulating in publi...
Journal title:
- Systematic Biology
Volume 53, Issue 6
Pages: -
Publication date: 2004